Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prevent log message truncation #6554

Merged
merged 5 commits into from
Jan 14, 2025
Merged

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Jan 14, 2025

This fixes #6545 by falling back to a malloc'd buffer if truncation is detected when handling log messages.

Since the internal fixed buffer size is no longer relevant, the definition of FLUX_MAX_LOGBUF is also removed from the public interface. However, I'm fine dropping that commit if it goes too far.

Fixes #6545.

Comment on lines 123 to 140
/* Copy va_list ap in case it reused on truncation below:
*/
va_copy (ap_copy, ap);

if (vsnprintf (result, sizeof (buf), fmt, ap) >= sizeof (buf)) {
/* Message truncation occurred. Fall back to malloc. If that
* fails, just print the truncated message:
*/
if (vasprintf (&mbuf, fmt, ap_copy) == 0)
result = mbuf;
}
rc = fprintf (stderr, "%s: %s\n", lstr, result);
free (mbuf);
return rc < 0 ? -1 : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

va_copy(3) says

Each invocation of va_copy() must be matched by a corresponding invocation
of va_end() in the same function.

Add one after the conditional block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh! Thanks for catching this one!

Comment on lines +173 to +180
/* Copy va_list ap in case it reused on truncation below:
*/
va_copy (ap_copy, ap);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment here

* If this fails for any reason, fall back to truncated output.
*/
if (n >= sizeof (ctx->buf)) {
int size = n++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side effect on n seems not to have a purpose?

@@ -66,19 +66,14 @@ static int forward_logbuf (flux_t *h,
struct stdlog_header hdr;
const char *txt;
size_t txtlen;
char buf[FLUX_MAX_LOGBUF];
int len = strlen (stdlog);
char buf[len + 1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is unbounded, are we concerned about blowing the stack if the system conjurs a really big log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, that case should be handled. I'm not sure if it would just be easier to just use the heap here, since this is a limited case. Otherwise, the VLA size could be replaced with a macro that takes the min of len + 1 and some sane maximum (4K?) to avoid blowing up the stack?

Copy link
Member

@garlick garlick Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably heap? Otherwise the sane max just becomes the new FLUX_MAX_LOGBUF (albeit just in this case!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fixed that and addressed the other comments. Thanks!

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nothing major in my comments so I'll approve and let you address whatever you think is worthwhile.

Problem: Flux log messages may be truncated because all messages
are formated into an internal fixed size buffer of FLUX_MAX_LOGBUF
(2048) bytes. On large clusters, this can result in regular loss of
important information in the logs.

Attempt to allocate a large enough buffer for the full log message
on the heap when truncation is detected in flux_vlog(3). Do this
for both the case where the log destination is stderr (h == NULL),
or the message is being forwarded to a handler or via an RPC.
Problem: There is no test of logging a large message to stderr in
the libflux log unit tests.

Add a test that logs a message > FLUX_MAX_LOGBUF.

Note: Since the log message goes to stderr, i.e. the output of the
unit test itself, we can't ensure the message was not truncated. However,
this is still useful to ensure the test does not die, and that no leaks
are present (i.e. by running the test under valgrind)
Problem: No tests in the testsuite ensure that Flux log messages are
not truncated at FLUX_MAX_LOGBUF (2048) bytes.

Add a test to t0009-dmesg.t that logs a message much larger than
2048 bytes and ensures the result is not truncated.
Problem: In the broker shutdown code, log messages may be
truncated because the log forwarding code uses a static buffer of
FLUX_MAX_LOGBUF.

Since a maximum size of the final log message is known, use a
variable length array for the buffer in forward_logbuf() instead
of FLUX_MAX_LOGBUF to avoid any truncation.
Problem: libflux/flog.h exposes FLUX_MAX_LOGBUF via the public API,
but there are currently no users. Furthermore, the implementation
now falls back to a heap-allocated buffer if the internal fixed-size
buffer causes log message truncation, so exposing the internal buffer
size is misleading.

Move the definition of FLUX_MAX_LOGBUF out of the public API.
@grondo
Copy link
Contributor Author

grondo commented Jan 14, 2025

Thanks! Setting MWP now.

@mergify mergify bot merged commit 8ffa947 into flux-framework:master Jan 14, 2025
35 checks passed
Copy link

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 92.68293% with 3 lines in your changes missing coverage. Please review.

Project coverage is 79.44%. Comparing base (6326cbb) to head (014112d).
Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
src/common/libflux/flog.c 90.90% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6554   +/-   ##
=======================================
  Coverage   79.44%   79.44%           
=======================================
  Files         531      531           
  Lines       88196    88224   +28     
=======================================
+ Hits        70063    70092   +29     
+ Misses      18133    18132    -1     
Files with missing lines Coverage Δ
src/broker/shutdown.c 82.30% <100.00%> (+0.56%) ⬆️
src/common/libflux/flog.c 85.50% <90.90%> (+0.41%) ⬆️

... and 6 files with indirect coverage changes

@grondo grondo deleted the issue#6545 branch January 14, 2025 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

important log messages may be truncated
2 participants